Array Based Dynamic Graph #135
@@ -61,4 +61,6 @@ object FastUtilUtils {

  def newInt2IntOpenHashMap(): mutable.Map[Int, Int] =
    new Int2IntOpenHashMap().asInstanceOf[jutil.Map[Int, Int]]

  def intArrayListToSeq(list: IntArrayList): Seq[Int] = list map { _.toInt }
list.toIntArray() will also work, but this is fine too.
this will be O(n) btw
You don't need to use this at all (see my previous comment)
Good point about O(n)! [edit: I'm now using the IndexedSeq wrapper you propose below, so it should be O(1)]
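For reference, the difference between the O(n) copy and the O(1) wrapper discussed above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `IntArrayView` is a hypothetical name, and a plain `Array[Int]` plus a logical size stands in for fastutil's `IntArrayList` so that the snippet has no external dependencies.

```scala
// Hypothetical read-only view over the first `size0` elements of an int array.
// Wrapping is O(1): no elements are copied, and reads go straight to the array.
final class IntArrayView(backing: Array[Int], size0: Int)
    extends scala.collection.IndexedSeq[Int] {
  require(size0 >= 0 && size0 <= backing.length, "size out of range")

  def length: Int = size0

  def apply(i: Int): Int = {
    if (i < 0 || i >= size0) throw new IndexOutOfBoundsException(i.toString)
    backing(i)
  }
}
```

Because the view shares the backing array, later writes to the array are visible through the view; callers that need a stable snapshot would still have to copy.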
Consider adding a benchmark in the cassovary-benchmark subproject as well to demo the performance of this.
// outboundLists(id) contains the outbound neighbors of the given id,
// or null if the id is not in this graph.
// If we aren't storing outbound neighbors, outboundLists will always remain size 0.
private val outboundLists = new ArrayBuffer[IntArrayList]
You can also consider using http://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/objects/ObjectArrayList.html instead of ArrayBuffer
Do you think it is more efficient than ArrayBuffer?
No idea.
I probably won't get to the benchmark for a few days, but I'll post once I have it, and I'll make a call then about multi-threading.
OK regarding the benchmark.
I needed a mutable undirected graph, so I just added code to support undirected graphs (i.e. stored direction Mutual).
I actually didn't realize until today that "Mutual" meant undirected and was starting to write my own UndirectedGraph wrapper class when I noticed it. Should I add a comment to StoredGraphDir along the lines of
LGTM. Yes, Mutual has that intention, but it will be better for there to be a barebones UndirectedGraphWrapper class for readability and to avoid the surprise you ran into.
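One possible reading of the barebones wrapper suggested above is sketched here. Everything in it is an assumption made for illustration: `MutableDirectedGraph` and `MapBackedGraph` are hypothetical stand-ins, not Cassovary's actual interfaces, and the wrapper simply records each undirected edge in both directions.

```scala
import scala.collection.mutable

// Hypothetical minimal directed-graph interface (not Cassovary's API).
trait MutableDirectedGraph {
  def addEdge(src: Int, dst: Int): Unit
  def outNeighbors(id: Int): Seq[Int]
}

// Hypothetical toy implementation, used only to make the sketch runnable.
final class MapBackedGraph extends MutableDirectedGraph {
  private val adj = mutable.Map.empty[Int, mutable.ArrayBuffer[Int]]

  def addEdge(src: Int, dst: Int): Unit = {
    adj.getOrElseUpdate(src, mutable.ArrayBuffer.empty[Int]) += dst
    ()
  }

  def outNeighbors(id: Int): Seq[Int] =
    adj.get(id).map(_.toList).getOrElse(Nil)
}

// Barebones undirected view: each edge {a, b} is stored as a -> b and b -> a,
// so the neighbors of a node are just its outbound neighbors.
final class UndirectedGraphWrapper(underlying: MutableDirectedGraph) {
  def addEdge(a: Int, b: Int): Unit = {
    underlying.addEdge(a, b)
    if (a != b) underlying.addEdge(b, a) // store a self-loop only once
  }

  def neighbors(id: Int): Seq[Int] = underlying.outNeighbors(id)
}
```

The point of such a class would be naming: callers see "undirected" in the type rather than having to know that the Mutual direction carries that meaning.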
A single-threaded dynamic graph implementation that keeps nodes and each node's adjacencies in native arrays.
I've implemented a dynamic directed graph, using an ArrayBuffer of IntArrayList (from the fastutil library). If n nodes are used, O(n) objects are created, independent of the number of edges.
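To make the layout concrete, here is a minimal sketch of the idea, storing outbound edges only. It is not the PR's code: `GrowableIntList` is a hypothetical stand-in for fastutil's `IntArrayList` (so the snippet runs without dependencies), and `ArrayGraphSketch` and its method names are made up for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical unboxed growable int list; the PR uses fastutil's IntArrayList.
final class GrowableIntList {
  private var data = new Array[Int](4)
  private var n = 0

  def add(x: Int): Unit = {
    // Double the backing array when full, giving amortized O(1) appends.
    if (n == data.length) data = java.util.Arrays.copyOf(data, n * 2)
    data(n) = x
    n += 1
  }

  def size: Int = n

  // Copy of the stored elements, trimmed to the logical size.
  def toArray: Array[Int] = java.util.Arrays.copyOf(data, n)
}

// Single-threaded dynamic directed graph: one list object per node,
// regardless of how many edges that node has.
final class ArrayGraphSketch {
  // outboundLists(id) holds the outbound neighbors of id, or null if id is absent.
  private val outboundLists = new ArrayBuffer[GrowableIntList]

  def addNode(id: Int): Unit = {
    require(id >= 0, "node ids must be non-negative")
    while (outboundLists.size <= id) outboundLists += null
    if (outboundLists(id) == null) outboundLists(id) = new GrowableIntList
  }

  def addEdge(src: Int, dst: Int): Unit = {
    addNode(src)
    addNode(dst)
    outboundLists(src).add(dst)
  }

  def containsNode(id: Int): Boolean =
    id >= 0 && id < outboundLists.size && outboundLists(id) != null

  def outNeighbors(id: Int): Array[Int] =
    if (containsNode(id)) outboundLists(id).toArray else Array.empty[Int]
}
```

Note that the outer buffer grows to the largest node id seen, which is why this layout suits mostly sequential ids and wastes slots when ids are sparse.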
Comparison to the current class: Note that the current dynamic graph class SynchronizedDynamicGraph uses a ConcurrentHashMap to store nodes, which has more overhead than an ArrayBuffer when the node ids are mostly sequential. Also, each node stores an ArrayBuffer[Int], which I believe boxes the ints into objects, creating overhead relative to fastutil's IntArrayList. It also has non-trivial synchronization overhead; in particular, when iterating over the neighbors of a node, the neighbors are first copied into a new array, then an iterator to the new array is returned. The current class is better when the node ids are very non-sequential or when automatic synchronization is needed, but otherwise I believe the new class is more efficient. In a very informal comparison on a graph with several hundred million edges, the current class wouldn't load using 30GB of heap, while the new class fit the graph in 7.3 GB of RAM.
@pankajgupta Could you take a look at this, or point me to someone else?
Thanks!